Method and apparatus for audio compression

ABSTRACT

A method and apparatus for audio compression receives an audio signal. Transform coding is applied to the audio signal to generate a sequence of transform frequency coefficients. The sequence of transform frequency coefficients is partitioned into a plurality of non-uniform width frequency ranges and then zero value frequency coefficients are inserted at the boundaries of the non-uniform width frequency ranges. As a result, certain of the transform frequency coefficients that represent high frequencies are dropped.

CROSS REFERENCE TO RELATED APPLICATIONS

[0001] This application claims priority from U.S. Provisional Patent Application, Serial No. entitled “Method and Apparatus for Audio Compression” filed Feb. 28, 2003.

BACKGROUND OF THE INVENTION

[0002] 1. Field of the Invention

[0003] The invention relates to the field of data compression. More specifically, the invention relates to audio compression.

[0004] 2. Background of the Invention

[0005] To allow typical computing systems to process (e.g., store, transmit, etc.) audio signals, various techniques have been developed to reduce (compress) the amount of data required to represent an audio signal. In typical audio compression systems, the following steps are generally performed: (1) a segment or frame of an audio signal is transformed into a frequency domain; (2) transform coefficients representing (at least a portion of) the frequency domain are quantized into discrete values; and (3) the quantized values are converted (or coded) into a binary format. The encoded/compressed data can be output, stored, transmitted, and/or decoded/decompressed.

[0006] To achieve relatively high compression/low bit rates (e.g., 8 to 16 kbps) for various types of audio signals (e.g., speech, music, etc.), some compression techniques (e.g., CELP, ADPCM, etc.) limit the number of components in a segment (or frame) of an audio signal which is to be compressed. Unfortunately, such techniques typically do not take into account relatively substantial components of an audio signal. Thus, such techniques result in a relatively poor quality synthesized (decompressed) audio signal due to loss of information.

[0007] One method of audio compression that allows relatively high quality compression/decompression involves transform coding (e.g., discrete cosine transform, Fourier transform, etc.). Transform coding typically involves transforming an input audio signal using a transform method, such as low order discrete cosine transform (DCT). Typically, each transform coefficient of a portion (or frame) of an audio signal is quantized and encoded using any number of well-known coding techniques. Transform compression techniques, such as DCT, generally provide a relatively high quality synthesized signal, since a relatively high number of spectral components of an input audio signal are taken into consideration.

[0008] Most audio signal compression algorithms are based on transform coding. Some examples of transform coders include Dolby AC-2, AC-3, MPEG LII and LIII, ATRAC, Sony MiniDisc, and Ogg Vorbis I. These coders employ the modified discrete cosine transform (MDCT) with different frame lengths and overlap factors.

[0009] Increasing frame length leads to better frequency resolution. As a result, high compression ratios can be achieved for stationary audio signals by increasing frame length. However, transform frequency coefficient quantization errors are spread over the entire length of a frame. The pursuit of higher compression with larger frame length results in “echo”, which appears when sound attacks are present in an audio signal input. This means that frame length, or frequency resolution, should vary depending on the input audio signal. In particular, the transform length should be shorter during sound attacks and longer for stationary signals. However, a sound attack may only occupy part of an entire signal bandwidth.

[0010] Large transform length also leads to large computational complexity. Both the number of computations and the dynamic range of transform coefficients increase if transform length increases, hence higher computational precision is required. Audio data representation and arithmetic operations must be performed with at least 24-bit precision if the frame is greater than or equal to 1024 samples, hence 16-bit digital signal processing cannot be used for encoding/decoding algorithms.

[0011] In addition, conventional MDCT provides identical frequency resolution over an entire signal, even though different frequency resolutions are appropriate for different frequency ranges. To accommodate the perceptual ability of the human ear, higher frequency resolution is needed for low-frequency ranges and lower frequency resolution is needed for high-frequency ranges.

[0012] Furthermore, the amplitude transfer function of conventional MDCT is not “flat” enough. There are significant irregularities near frequency range boundaries. These irregularities make it difficult to use MDCT coefficients for psycho-acoustic analysis of the audio signal and to compute bit allocation. Conventional audio codecs compute an auxiliary spectrum (typically with FFT, which is computationally expensive) for constructing a psycho-acoustic model (PAM).

BRIEF SUMMARY OF THE INVENTION

[0013] A method and apparatus for audio compression is described. According to one aspect of the invention, a method and apparatus for audio compression provides for receiving an audio signal, applying transform coding to the audio signal to generate a sequence of transform frequency coefficients, partitioning the sequence of transform frequency coefficients into a plurality of non-uniform width frequency ranges, inserting zero value frequency coefficients at the boundaries of the non-uniform width frequency ranges, and dropping certain of the transform frequency coefficients that represent high frequencies.

[0014] These and other aspects of the present invention will be better described with reference to the Detailed Description and the accompanying Figures.

BRIEF DESCRIPTION OF THE DRAWINGS

[0015] The invention may best be understood by referring to the following description and accompanying drawings that are used to illustrate embodiments of the invention. In the drawings:

[0016] FIG. 1 is an exemplary diagram of an audio encoder with an adaptive non-uniform filterbank according to one embodiment of the invention.

[0017] FIG. 2 is a block diagram of an exemplary adaptive non-uniform filterbank according to one embodiment of the invention.

[0018] FIG. 3 is a flowchart for encoding an audio signal input according to one embodiment of the invention.

[0019] FIG. 4 is a diagram illustrating exemplary zero value frequency coefficient stuffing according to one embodiment of the invention.

[0020] FIG. 5 is a block diagram of an exemplary audio encoding unit with a non-uniform frequency range transfer function flattening filterbank and an adaptive sound attack based transform length varying filterbank according to one embodiment of the invention.

[0021] FIG. 6 is a block diagram illustrating an exemplary audio decoder according to one embodiment of the invention.

[0022] FIG. 7 is a block diagram of an exemplary inverse non-uniform filterbank according to one embodiment of the invention.

[0023] FIG. 8 is a diagram illustrating removal of boundary frequency coefficients from frequency ranges according to one embodiment of the invention.

DETAILED DESCRIPTION OF THE INVENTION

[0024] In the following description, numerous specific details are set forth to provide a thorough understanding of the invention. However, it is understood that the invention may be practiced without these specific details. In other instances, well-known circuits, structures, standards, and techniques have not been shown in detail in order not to obscure the invention.

[0025] Overview

[0026] A method and apparatus for audio compression is described. According to one embodiment of the invention, a method and apparatus for audio compression generates frequency ranges of non-uniform width (i.e., the frequency ranges are not all represented by the same number of transform frequency coefficients) during encoding of an audio input signal. Each of these non-uniform frequency ranges is processed separately, thus reducing the computational complexity of processing the audio signal represented by the frequency ranges. Partitioning (logical or actual) a transformed audio signal input into non-uniform frequency ranges also enables utilization of different frequency resolutions based on the width of a frequency range.

[0027] According to another embodiment of the invention, transform frequency coefficients at the boundary of each of these frequency ranges are displaced with zero-value frequency coefficients (i.e., the frequency ranges are stuffed with zeroes at their boundaries). Stuffing zeroes at the boundaries of the frequency ranges provides for a flattened amplitude transfer function that can be used for quantizing, encoding, and psycho-acoustic model (PAM) computing.

[0028] In another embodiment of the invention, normalization and transforms are performed on a set of non-uniform width frequency ranges based on their width. Separately processing different width frequency ranges enables scalability and support of multiple sampling rates and multiple bit rates. Furthermore, separately processing each of a set of non-uniform frequency ranges enables modification of time resolution based on detection of a sound attack within a particular frequency range, independent of the other frequency ranges.

[0029] Decoding an audio signal that has been encoded as described above includes extracting frequency ranges from an encoded audio bitstream and processing the frequency ranges separately.

[0030] Encoding an Audio Signal

[0031] FIG. 1 is an exemplary diagram of an audio encoder with an adaptive non-uniform filterbank according to one embodiment of the invention. In FIG. 1, an adaptive non-uniform filterbank 101 is coupled with a PAM computing unit 105, a quantization unit 103, and a lossless coding unit 107. The adaptive non-uniform filterbank 101 is described at a high level in FIG. 1 and will be described in more detail below. The adaptive non-uniform filterbank 101 receives an audio signal input. The adaptive non-uniform filterbank 101 processes the received audio signal input and generates indications of applied transform length, normalization coefficients, transform frequency coefficients, and block lengths of each frequency range.

[0032] The transform frequency coefficients are processed by the adaptive non-uniform filterbank 101 based on the width of their corresponding frequency range and multiplexed together before being transmitted to the quantization unit 103 and the PAM computing unit 105. The transform frequency coefficients can be sent to both the quantization unit 103 and the PAM computing unit 105 because the adaptive non-uniform filterbank 101 has performed zero stuffing on the transform frequency coefficients to flatten the amplitude transfer function. The block lengths sent to the PAM computing unit 105 and the quantization unit 103 indicate the width of each frequency range.

[0033] The normalization coefficients sent from the adaptive non-uniform filterbank 101 to the lossless coding unit 107 include a normalization coefficient for each of the non-uniform width frequency ranges generated by the adaptive non-uniform filterbank 101. In an alternative embodiment of the invention, the normalization coefficients are transmitted to the quantization unit 103 in addition to or instead of the lossless coding unit 107.

[0034] The adaptive non-uniform filterbank 101 also sends indications of applied transform length to the lossless coding unit 107. The indications of applied transform length indicate whether a short or long transform was performed on a frequency range. The adaptive non-uniform filterbank 101 adapts the length of the transform performed on a frequency range based on the presence of a sound attack within that frequency range.

[0035] FIG. 2 is a block diagram of an exemplary adaptive non-uniform filterbank according to one embodiment of the invention. FIG. 3 is a flowchart for encoding an audio signal input according to one embodiment of the invention. FIG. 2 will be described with reference to FIG. 3. In FIG. 2, an adaptive non-uniform filterbank 202 includes a non-uniform frequency range transfer function flattening filterbank 201, an adaptive sound attack based transform length varying filterbank 203, and a sound attack based transform length decision unit 205.

[0036] The non-uniform frequency range transfer function flattening filterbank 201 is coupled with the adaptive sound attack based transform length varying filterbank 203. The sound attack based transform length decision unit 205 is also coupled with the adaptive sound attack based transform length varying filterbank 203. In FIG. 2, the non-uniform frequency range transfer function flattening filterbank 201 and the sound attack based transform length decision unit 205 both receive an audio signal input. To make independent decisions for different subbands, the sound attack based transform length decision unit 205 also (or instead) receives the output of the non-uniform frequency range transfer function flattening filterbank 201. The original time-domain signal is used to make decisions about the presence of sound attacks over the entire signal.

[0037] Referring to FIG. 3 at block 301, the non-uniform frequency range transfer function flattening filterbank 201 of FIG. 2 generates non-uniform frequency ranges of transform frequency coefficients from the audio input signal. At block 303, zero value frequency coefficients are stuffed at the boundaries of the frequency ranges. At block 305, the transform frequency coefficients that have been shifted beyond the last frequency range because of zero value frequency coefficient stuffing are dropped.

[0038] FIG. 4 is a diagram illustrating exemplary zero value frequency coefficient stuffing according to one embodiment of the invention. In FIG. 4, a line diagram indicates 320 transform frequency coefficients. The 320 transform frequency coefficients have been partitioned into 5 frequency ranges (also referred to as subbands). Frequency ranges 401, 403, 405, 407, and 409 respectively include transform frequency coefficients 1-32, 33-64, 65-128, 129-192, and 193-320. In alternative embodiments of the invention, a greater or fewer number of frequency ranges may be generated. Also, a greater or fewer number of transform frequency coefficients may be generated.

[0039] After zero value frequency coefficient stuffing, a different set of frequency ranges is generated. A frequency range 411 includes transform frequency coefficients 1-30 and two zero value frequency coefficients at the end of the frequency range 411. Frequency ranges 413, 415, and 417 each include two zero value frequency coefficients at their beginning and at their end. Between the boundary zero value frequency coefficients, the frequency ranges 413, 415, and 417 respectively include transform frequency coefficients 31-58, 59-118, and 119-178. The last frequency range 419 includes two zero value frequency coefficients at the beginning of the range and transform frequency coefficients 179-304. As illustrated by FIG. 4, stuffing sixteen zero value frequency coefficients at the boundaries of the frequency ranges has resulted in the last sixteen transform frequency coefficients being shifted out of the last frequency range 419 and dropped. Typically, the frequency coefficients that are dropped represent frequencies that are not perceivable by the human ear. Although FIG. 4 has been described with reference to stuffing two zero value frequency coefficients at the boundaries of frequency ranges, a lesser number or greater number of zero value frequency coefficients can be stuffed at the boundaries of frequency ranges.
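
By way of illustration only, the stuffing layout described for FIG. 4 may be sketched in Python as follows. The function name `stuff_zeros`, its parameters, and the use of integer stand-in values (coefficient k has value k) are assumptions made for this sketch and do not appear in the figures.

```python
def stuff_zeros(coeffs, widths, pad=2):
    """Place `pad` zeros on each side of every interior range boundary.

    Range widths are preserved, so the highest-frequency coefficients
    that no longer fit in the last range are dropped.
    Returns (stuffed ranges, dropped coefficients).
    """
    assert len(coeffs) == sum(widths)
    ranges, pos = [], 0
    last = len(widths) - 1
    for i, width in enumerate(widths):
        lead = pad if i > 0 else 0       # no leading zeros in the first range
        trail = pad if i < last else 0   # no trailing zeros in the last range
        keep = width - lead - trail      # real coefficients that still fit
        ranges.append([0] * lead + coeffs[pos:pos + keep] + [0] * trail)
        pos += keep
    return ranges, coeffs[pos:]


# Stand-in data: coefficient k simply has the value k (1..320).
coeffs = list(range(1, 321))
ranges, dropped = stuff_zeros(coeffs, [32, 32, 64, 64, 128])

print([len(r) for r in ranges])        # range widths are unchanged
print(ranges[1][:3], ranges[1][-3:])   # second range starts and ends with zeros
print(dropped[0], dropped[-1])         # first and last dropped coefficient
```

With these inputs the second range holds coefficients 31-58 between its boundary zeros, and coefficients 305-320 are dropped, in agreement with the description of FIG. 4.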

[0040] As previously stated, displacing transform frequency coefficients at the boundaries of frequency ranges with zero value frequency coefficients flattens the amplitude transfer function for the represented audio signal. Flattening the transfer function enables the same transform coefficients to be used for PAM construction and quantization and encoding.

[0041] Returning to FIG. 3, normalization coefficients are generated based on the zero stuffed non-uniform frequency ranges at block 307. At block 309, a transform is performed on each frequency range based on the width of the frequency range. At block 311, the audio signal and transform frequency coefficients are analyzed for sound attacks and the transform length performed on frequency ranges is varied based on detection of a sound attack.

[0042] Referring to FIG. 2, the sound attack based transform is performed by the adaptive sound attack based transform length varying filterbank 203. The sound attack based transform length decision unit 205 of FIG. 2 determines if a sound attack is present in a particular frequency range and indicates to the adaptive sound attack based transform length varying filterbank 203 the appropriate transform length that should be applied.

[0043] The sound attack based transform length decision unit 205 is coupled with a lossless coding unit 211 and sends indications of applied transform lengths to the lossless coding unit 211. The adaptive sound attack based transform length varying filterbank 203 is coupled with a quantization unit 209 and a PAM computing unit 207. The adaptive sound attack based transform length varying filterbank 203 sends transform frequency coefficients and block length to the quantization unit 209 and the PAM computing unit 207.

[0044] The non-uniform frequency range transfer function flattening filterbank 201 is coupled with the lossless coding unit 211. The non-uniform frequency range transfer function flattening filterbank 201 generates normalization coefficients as described at block 307 in FIG. 3 and sends these generated normalization coefficients to the lossless coding unit 211. In an alternative embodiment of the invention, the normalization coefficients are sent to the quantization unit 209.

[0045] Partitioning a signal into multiple frequency ranges and processing the multiple frequency ranges separately reduces the complexity of the encoded audio signal and enables flexibility of the algorithm.

[0046] FIG. 5 is a block diagram of an exemplary audio encoding unit with a non-uniform frequency range transfer function flattening filterbank and an adaptive sound attack based transform length varying filterbank according to one embodiment of the invention. In FIG. 5, a modified discrete cosine transform 640 (MDCT640) unit 501 receives 320 samples. Each time period, 320 samples are received by the MDCT640 unit 501 and combined with the previous 320 samples to generate a 640 sample frame. The MDCT640 unit 501 windows and transforms these 640 samples to obtain 320 transform frequency coefficients. The MDCT640 unit 501 then partitions the 320 transform frequency coefficients into frequency ranges of non-uniform width. These frequency ranges are sent to a zero-stuffing unit 503. The zero-stuffing unit 503 stuffs zero value frequency coefficients at the boundaries of the frequency ranges and drops those transform frequency coefficients shifted out of the last frequency range, as previously described.
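
As a minimal sketch of the framing performed by the MDCT640 unit 501, the following Python code joins 320 new samples with the previous 320 samples and applies a direct (unoptimized) MDCT. The sine window, the helper names `mdct` and `transform_block`, and the 1 kHz test tone are assumptions of this sketch; the description above does not specify the window shape.

```python
import math


def mdct(frame):
    """Direct O(N^2) MDCT: a 2N-sample frame yields N coefficients.

    Assumption: a sine window is applied before the transform.
    """
    two_n = len(frame)
    n = two_n // 2
    windowed = [x * math.sin(math.pi * (i + 0.5) / two_n)
                for i, x in enumerate(frame)]
    return [
        sum(w * math.cos(math.pi / n * (i + 0.5 + n / 2) * (k + 0.5))
            for i, w in enumerate(windowed))
        for k in range(n)
    ]


def transform_block(previous, new_samples):
    """Combine the previous and new samples into one frame and transform it.

    Returns the coefficients and the history to use for the next call.
    """
    frame = list(previous) + list(new_samples)
    return mdct(frame), list(new_samples)


HOP = 320
history = [0.0] * HOP                      # no signal before the first block
block = [math.sin(2 * math.pi * 1000 * t / 44100) for t in range(HOP)]
coeffs, history = transform_block(history, block)
print(len(coeffs))                         # 320 coefficients per 640-sample frame
```

A practical encoder would use a fast algorithm rather than this quadratic loop; the direct form is shown only because it makes the 640-in, 320-out relationship easy to see.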

[0047] After zero-stuffing, the zero-stuffing unit 503 sends each frequency range to a different normalization unit. In FIG. 5, the 320 transform frequency coefficients have been partitioned into 5 frequency ranges. Each of the frequency ranges is sent to a different one of normalization units 505A-505E. The energy and dynamic range of transform frequency coefficients are different for different frequency ranges. Typically, the average energy in the first frequency range is 50-80 dB larger than for the last frequency range. Normalizing each frequency range separately enables further computations in each frequency range using relatively simple fixed-point arithmetic. Each of the normalization units 505A-505E generates a normalization coefficient for its corresponding frequency range, which is sent to the next unit in the encoding process (e.g., the quantization unit). Each normalized frequency range then flows into one of a set of inverse MDCT units. In FIG. 5, the first frequency range flows into an IMDCT64 unit 507A and the second frequency range flows into an IMDCT64 unit 507B. The third and fourth frequency ranges respectively flow into IMDCT128 units 507C and 507D. The fifth frequency range flows into an IMDCT256 unit 507E. Each of the IMDCT units 507A-507E performs an inverse DCT-IV transform, windowing, and overlapping with previous normalized transform frequency coefficients on the received normalized transform frequency coefficients. Outputs from the IMDCT units 507A-507E respectively flow into MDCT units 509A-509E. Output from the IMDCT units 507A-507E also flows into a sound attack based transform length decision unit 504.
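
The per-range normalization can be sketched as follows. The description does not state how a normalization coefficient is derived, so peak-magnitude scaling is used here purely as an assumption, and the names `normalize_range` and `denormalize_range` are hypothetical.

```python
def normalize_range(coeffs):
    """Scale one frequency range so its largest magnitude becomes 1.0.

    Assumption: the normalization coefficient is the peak magnitude.
    Returns (normalization coefficient, normalized coefficients).
    """
    peak = max((abs(c) for c in coeffs), default=0.0)
    if peak == 0.0:
        return 1.0, list(coeffs)           # an all-zero range needs no scaling
    return peak, [c / peak for c in coeffs]


def denormalize_range(norm_coef, coeffs):
    """Inverse operation, as a de-normalization unit in a decoder would apply."""
    return [c * norm_coef for c in coeffs]


# A loud low-frequency range and a quiet high-frequency range (made-up values).
low = [1200.0, -800.0, 300.0]
high = [0.02, -0.01, 0.005]
for band in (low, high):
    norm_coef, scaled = normalize_range(band)
    print(norm_coef, max(abs(c) for c in scaled))
```

After scaling, both ranges occupy the same numeric span despite their very different energies, which is what allows later stages to work with a fixed number of fractional bits.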

[0048] The sound attack based transform length decision unit 504 analyzes the raw 640 samples and the frequency ranges from the IMDCT units 507A-507E to detect sound attacks over the entire frame and/or within each frequency range. Based on detection of a sound attack, the sound attack based transform length decision unit 504 indicates to the appropriate MDCT unit the transform length that should be performed on a certain frequency range. The sound attack based transform length decision unit 504 also indicates to a lossless encoding unit the length of transform performed.

[0049] To illustrate transform length varying based on sound attack detection, processing of the first frequency range received by the MDCT512/128 unit 509A will be explained. If a sound attack is not detected in the first frequency range, then a 256-sample long transform is used. In other words, 8 output blocks of 32 transform frequency coefficients each are combined to obtain a sequence of length 256. This sequence is coupled with the 256 previous samples to obtain an input frame for the length 512 MDCT transform performed by the MDCT512/128 unit 509A. The MDCT512/128 unit 509A will generate 256 transform frequency coefficients. If a sound attack is detected in the first frequency range, then the MDCT512/128 unit 509A is switched to short-length mode of functioning. First, a transitional frame of length 256+64=320 is transformed. After the transitional frame is transformed, short transforms of length 128 are applied to the first frequency range until a decision is made by the sound attack based transform length decision unit 504 to switch to long-length transform. Another transitional frame (of length 320) is used to switch from short-length to long-length mode. Although in one embodiment of the invention MDCT units perform short or long length transforms, alternative embodiments of the invention have a greater number of modes of transform length. By switching to short transform length mode, the time resolution interval can be reduced by a factor of 4 during sound attacks or dynamically changing signals in any frequency range.
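
The long-length buffering described for the first frequency range (eight blocks of 32 combined into a 256-long sequence, then coupled with the previous 256) can be sketched as below. The class name `LongFrameBuilder` and its interface are hypothetical, and the sketch covers only the long-length mode; transitional and short frames are not modeled.

```python
class LongFrameBuilder:
    """Collects per-frame blocks into input frames for a long transform."""

    def __init__(self, block_len=32, blocks_per_sequence=8):
        self.block_len = block_len
        self.seq_len = block_len * blocks_per_sequence   # 8 * 32 = 256
        self.pending = []                                # blocks gathered so far
        self.previous = [0.0] * self.seq_len             # previous 256-long sequence

    def push(self, block):
        """Add one block; return a 512-long frame once 256 new values exist."""
        assert len(block) == self.block_len
        self.pending.extend(block)
        if len(self.pending) < self.seq_len:
            return None
        frame = self.previous + self.pending             # previous 256 + new 256
        self.previous, self.pending = self.pending, []
        return frame


builder = LongFrameBuilder()
frames = [builder.push([float(i)] * 32) for i in range(16)]
ready = [f for f in frames if f is not None]
print(len(ready), len(ready[0]))   # two frames of 512 values from 16 blocks
```

Each completed frame shares its first half with the second half of the frame before it, which is the 50% overlap an MDCT of length 512 expects.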

[0050] The transform frequency coefficients generated by the MDCT units 509A-509E are sent to a multiplexer 511. The multiplexer 511 orders the received transform frequency coefficients to form a sequence that will be quantized and losslessly encoded according to a PAM.

[0051] Assuming F_(o) denotes the sampling frequency of an audio signal and the audio signal does not include sound attacks (i.e., all MDCT units are functioning in long-length mode), then the maximal frequency resolution for low frequencies is equal to F_(o)/2/320/8 Hz. For example, if F_(o)=44100 Hz, then the frequency resolution will be equal to 8.6 Hz for the first and second frequency ranges. For the third and fourth frequency ranges, the frequency resolution will be equal to 17.2 Hz. For the fifth frequency range, the frequency resolution will be equal to 68.9 Hz.
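
The resolution figures above can be reproduced with a short calculation. This sketch assumes that, in long-length mode, the first four frequency ranges each yield 256 coefficients and the fifth yields 128, mirroring the IMDCT 512/128 and IMDCT 256/64 units of the decoder; only the 256-coefficient output of the first range is stated explicitly for the encoder. The function name `long_mode_resolutions` is hypothetical.

```python
def long_mode_resolutions(fs, widths=(32, 32, 64, 64, 128),
                          long_coeffs=(256, 256, 256, 256, 128), total=320):
    """Frequency resolution in Hz of each range in long-length mode.

    Each range covers (width / total) of the band 0..fs/2, and that
    bandwidth is divided among the coefficients of the long transform.
    """
    nyquist = fs / 2
    return [nyquist * w / total / n for w, n in zip(widths, long_coeffs)]


for index, hz in enumerate(long_mode_resolutions(44100), start=1):
    print(f"range {index}: {hz:.1f} Hz")
```

For the first range the expression reduces to F_(o)/2/320/8, because 32 of the 320 coefficients are spread over 256 = 32 x 8 long-transform coefficients.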

[0052] The audio encoder described in the above figures can be applied to applications that require scalability, embedded functioning, and/or support of multiple sampling rates and multiple bit rates. For example, assume a 44.1 kHz audio signal input is partitioned into 5 frequency ranges (or subbands). The information transmitted to various users can be scaled to accommodate particular users. One set of users may receive all 5 frequency ranges whereas other users may only receive the first three frequency ranges (the lower frequency ranges). The two different sets of users are provided different bit-rates and different signal quality. The audio decoders of the set of users that receive only the lower frequency ranges reconstruct half of the time-domain samples, resulting in a 22.05 kHz signal sampling frequency. If a set of users only receives the 1^(st) frequency range (lowest frequency), then the reconstructed signal can be reproduced with a sampling rate of 8 or 11.025 kHz.

[0053] Decoding a Zero Stuffed Length Varied Audio Signal

[0054] Decoding a zero stuffed length varied audio signal involves performing the inverse of the encoding operations described above.

[0055] FIG. 6 is a block diagram illustrating an exemplary audio decoder according to one embodiment of the invention. A demultiplexer 601 receives a bitstream. The demultiplexer 601 is coupled with a lossless decoder and dequantizer 603 and an inverse non-uniform filterbank 605. The demultiplexer 601 extracts encoded data (quantized and encoded zero stuffed length varied transform frequency coefficients) and bit allocation from the received bitstream and sends them to the lossless decoder and dequantizer 603. The demultiplexer 601 also extracts frame length from the bitstream and sends the frame length to the lossless decoder and dequantizer 603 and the inverse non-uniform filterbank 605. The lossless decoder and dequantizer 603 uses the bit allocation and the frame length to decode and dequantize the encoded data received from the demultiplexer 601. The lossless decoder and dequantizer 603 outputs transform frequency coefficients and normalization coefficients to the inverse non-uniform filterbank 605. The inverse non-uniform filterbank 605 processes the transform frequency coefficients and the normalization coefficients to generate synthesized audio data.

[0056] FIG. 7 is a block diagram of an exemplary inverse non-uniform filterbank according to one embodiment of the invention. A demultiplexer 701 is coupled with IMDCT units 703A-703E. The IMDCT units 703A-703D are IMDCT 512/128 units. The IMDCT unit 703E is an IMDCT 256/64 unit. The demultiplexer 701 receives transform frequency coefficients and demultiplexes the transform frequency coefficients into frequency ranges. Frequency ranges 1-5 respectively flow to IMDCT units 703A-703E. All of the IMDCT units 703A-703E also receive frame length. After the IMDCT units 703A-703E perform inverse MDCT on the frequency range(s) that they have received, the outputs from the IMDCT units 703A-703E respectively flow to MDCT units 705A-705E. MDCT units 705A-705B are MDCT64 units. MDCT units 705C-705D are MDCT128 units. MDCT unit 705E is an MDCT256 unit. The MDCT units 705A-705E are respectively coupled with de-normalization units 707A-707E. Outputs from the MDCT units 705A-705E respectively flow to the de-normalization units 707A-707E. The de-normalization units 707A-707E also receive normalization coefficients. The de-normalization units 707A-707E de-normalize the transform frequency coefficients received from the MDCT units 705A-705E using the normalization coefficients. The de-normalized transform frequency coefficients flow into a zero-removing unit 709. The zero-removing unit 709 modifies the frequency ranges by removing boundary frequency coefficients that were originally zero value frequency coefficients.

[0057] FIG. 8 is a diagram illustrating removal of boundary frequency coefficients from frequency ranges according to one embodiment of the invention. In FIG. 8, frequency ranges 801, 803, 805, 807, and 809 respectively include transform frequency coefficients 1-32, 33-64, 65-128, 129-192, and 193-320. In the example illustrated in FIG. 8, the following transform frequency coefficients were originally zero value frequency coefficients: 31-34, 63-66, 127-130, and 191-194. After removal of boundary frequency coefficients, the resulting frequency ranges 811, 813, 815, 817, and 819 respectively include the following frequency coefficients: 1-30, 35, 36; 37-62, 67-72; 73-126, 131-140; 141-190, 195-208; and 209-304. In addition to transform frequency coefficients 209-304, the frequency range 819, which corresponds to the frequency range 809, also includes zero value frequency coefficients as the frequency coefficients 305-320.
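
The removal step can be sketched in Python as follows, assuming two stuffed zeros on each side of every interior boundary as in the example above. The function name `remove_boundary_zeros` is hypothetical, and the demonstration uses position numbers (1-320) as stand-in values so that the removed positions can be read off directly.

```python
def remove_boundary_zeros(ranges, pad=2):
    """Drop the stuffed boundary positions and refill the tail with zeros.

    The output has the same total length as the input; the positions freed
    by the removal appear as zeros at the high-frequency end.
    """
    total = sum(len(r) for r in ranges)
    last = len(ranges) - 1
    kept = []
    for i, r in enumerate(ranges):
        start = pad if i > 0 else 0                 # skip leading stuffed zeros
        stop = len(r) - (pad if i < last else 0)    # skip trailing stuffed zeros
        kept.extend(r[start:stop])
    return kept + [0] * (total - len(kept))


# Stand-in data: each entry holds its own position number, 1..320.
widths = [32, 32, 64, 64, 128]
positions = list(range(1, 321))
ranges, pos = [], 0
for w in widths:
    ranges.append(positions[pos:pos + w])
    pos += w

restored = remove_boundary_zeros(ranges)
removed = sorted(set(positions) - set(restored))
print(removed)              # positions that held stuffed zeros
print(restored[-16:])       # zero-filled tail
```

The removed positions are 31-34, 63-66, 127-130, and 191-194, matching the example, and the sixteen freed slots are filled with zeros at the end of the sequence.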

[0058] Returning to FIG. 7, the zero-removing unit 709 passes the modified frequency ranges to an IMDCT640 unit 711. After performing inverse MDCT on the frequency ranges, the IMDCT640 unit 711 outputs synthesized audio data.

[0059] The audio encoder and decoder described above include memories, processors, and/or ASICs. Such memories include a machine-readable medium on which is stored a set of instructions (i.e., software) embodying any one, or all, of the methodologies described herein. Software can reside, completely or at least partially, within this memory and/or within the processor and/or ASICs. For the purpose of this specification, the term “machine-readable medium” shall be taken to include any mechanism that provides (i.e., stores and/or transmits) information in a form readable by a machine (e.g., a computer). For example, a machine-readable medium includes read only memory (“ROM”), random access memory (“RAM”), magnetic disk storage media, optical storage media, flash memory devices, electrical, optical, acoustical, or other form of propagated signals (e.g., carrier waves, infrared signals, digital signals, etc.), etc.

ALTERNATIVE EMBODIMENTS

[0060] While the invention has been described in terms of several embodiments, those skilled in the art will recognize that the invention is not limited to the embodiments described. For instance, while the flow diagrams show a particular order of operations performed by certain embodiments of the invention, it should be understood that such order is exemplary (e.g., alternative embodiments may perform the operations in a different order, combine certain operations, overlap certain operations, etc.). In addition, while embodiments of the invention have been described with reference to MDCT and IMDCT, alternative embodiments of the invention utilize other transform coding techniques.

[0061] Thus, the method and apparatus of the invention can be practiced with modification and alteration within the spirit and scope of the appended claims. The description is thus to be regarded as illustrative instead of limiting on the invention.

We claim:
 1. A method for audio compressing comprising: receiving an audio signal; applying transform coding to the audio signal to generate a sequence of transform frequency coefficients; partitioning the sequence of transform frequency coefficients into a plurality of non-uniform width frequency ranges; inserting zero value frequency coefficients at the boundaries of the non-uniform width frequency ranges; and dropping certain of the transform frequency coefficients that represent high frequencies.
 2. The method of claim 1 further comprising separately applying a transform to each of the plurality of non-uniform width frequency ranges.
 3. The method of claim 2 wherein application of the transform is in parallel.
 4. The method of claim 1 further comprising varying length of transform operations applied to each of the plurality of non-uniform width frequency ranges.
 5. The method of claim 1 wherein the number of dropped transform frequency coefficients is equal to the number of inserted zero value frequency coefficients.
 6. The method of claim 1 further comprising: constructing a psycho-acoustic model with the plurality of non-uniform width frequency ranges with inserted zero value frequency coefficients; and quantizing the plurality of non-uniform width frequency ranges with inserted zero value frequency coefficients.
 7. A method for audio compression comprising: generating a plurality of frequency coefficients representing an audio signal; grouping the plurality of frequency coefficients into frequency ranges of non-uniform width; determining if a sound attack occurs in any one of the non-uniform width frequency ranges; and performing transform length switching separately on each of the frequency ranges based on determining occurrence of a sound attack.
 8. The method of claim 7 further comprising stuffing zeros at the boundaries of the non-uniform width frequency ranges and dropping certain of the plurality of frequency coefficients that represent higher end frequencies.
 9. The method of claim 8 wherein stuffing zeros at the boundaries comprises: inserting zeros at the boundaries of the frequency ranges; and shifting those of the plurality of frequency coefficients that are displaced by the inserted zeros into the next frequency range.
 10. The method of claim 7 further comprising separately performing transforms on each of the plurality of non-uniform width frequency ranges based on their width.
 11. The method of claim 10 wherein the transforms are inverse modified discrete cosine transforms.
 12. The method of claim 7 wherein the performed long and short transforms are modified discrete cosine transforms.
 13. A method for audio compression comprising: applying a transform to a plurality of audio samples to generate a sequence of transform frequency coefficients; and partitioning the sequence of transform frequency coefficients into varying width frequency subbands with zero value frequency coefficients at the boundaries of the frequency subbands.
 14. The method of claim 13 further comprising dropping a set of one or more transform frequency coefficients in the highest frequency subband.
 15. The method of claim 14 wherein the number of dropped transform frequency coefficients corresponds to the number of zero value frequency coefficients stuffed at the boundaries of the frequency subbands.
 16. The method of claim 13 further comprising: constructing a psycho-acoustic model with the varying width subbands; and quantizing the varying width subbands.
 17. The method of claim 13 further comprising applying transforms of varying length to each of the varying width subbands.
 18. A method for audio compression comprising: partitioning an audio input into a plurality of non-uniform frequency subbands, each of the plurality of non-uniform frequency subbands including a set of one or more frequency coefficients; displacing those of the set of frequency coefficients at the boundary of each subband with zeros; and dropping those of the set of frequency coefficients that fall outside of the plurality of frequency subbands after the displacing.
 19. The method of claim 18 further comprising separately applying a transform to each of the plurality of non-uniform frequency subbands.
 20. The method of claim 19 wherein application of the transform is in parallel.
 21. The method of claim 18 further comprising varying length of transform operations applied to each of the plurality of non-uniform frequency subbands.
 22. The method of claim 18 wherein the number of dropped frequency coefficients is equal to the number of inserted zeros.
 23. The method of claim 18 further comprising: constructing a psycho-acoustic model with the plurality of non-uniform frequency subbands; and quantizing the plurality of non-uniform frequency subbands.
 24. A method for audio compression comprising: generating a plurality of non-uniform frequency subbands, each of the plurality of non-uniform frequency subbands including a set of one or more frequency coefficients, from an audio input signal; displacing those of the set of frequency coefficients at the boundary of each non-uniform frequency subband with zeros; separately normalizing the non-uniform frequency subbands, including the zeros; varying transform length applied to each of the plurality of non-uniform frequency subbands based on the detection of a sound attack within the plurality of non-uniform frequency subbands; and multiplexing the plurality of non-uniform frequency subbands.
 25. The method of claim 24 wherein inverse modified discrete transform is applied to the plurality of non-uniform frequency subbands after normalizing.
 26. The method of claim 24 wherein the varied transform is modified discrete cosine transform.
 27. A method for audio decompression comprising: receiving a bitstream; extracting a sequence of transform frequency coefficients from the bitstream; demultiplexing the sequence of transform frequency coefficients into a plurality of frequency ranges; removing boundary transform frequency coefficients that were originally zeros from the plurality of frequency ranges; shifting the remaining transform frequency coefficients to fill in for the removed boundary transform frequency coefficients; and inserting zeros into vacancies in the higher range of the plurality of frequency ranges caused by said shifting.
 28. The method of claim 27 further comprising applying inverse modified discrete cosine transform to the plurality of frequency ranges.
 29. The method of claim 27 further comprising decoding and dequantizing the sequence of transform frequency coefficients.
 30. A method for audio compression comprising: partitioning an audio signal into a plurality of non-uniform width frequency ranges, each of the plurality of non-uniform width frequency ranges including a set of one or more transform frequency coefficients; indicating the width of each of the plurality of frequency ranges; separately processing each of the plurality of non-uniform width frequency ranges; and encoding the plurality of frequency ranges and their width indications.
 31. The method of claim 30 further comprising separately performing transform length switching on one of the plurality of frequency ranges based on detection of a sound attack within the one of the plurality of frequency ranges.
 32. The method of claim 30 further comprising: stuffing zeros at the boundaries of the plurality of frequency ranges; shifting those transform frequency coefficients displaced by the stuffed zeros; and dropping those transform frequency coefficients that fall outside of the plurality of frequency ranges from said shifting.
 33. The method of claim 30 wherein the processing comprises normalizing and transforming.
 34. The method of claim 33 wherein the transforming is modified discrete cosine transforming.
 35. An apparatus comprising: an adaptive non-uniform filterbank to represent an audio input with a number of transform frequency coefficients that is less than the audio input's number of samples; a quantization unit coupled with the adaptive non-uniform filterbank, the quantization unit to receive transform frequency coefficients from the adaptive non-uniform filterbank; and a lossless encoding unit coupled with the quantization unit, the lossless encoding unit to receive quantized transform coefficients from the quantization unit.
 36. The apparatus of claim 35 wherein the adaptive non-uniform filterbank comprises: a non-uniform frequency range transform function flattening filterbank to partition a sequence of transform frequency coefficients generated from the audio input into frequency ranges of non-uniform width and to flatten a transfer function of the sequence of transform frequency coefficients; an adaptive sound attack based transform length varying filterbank coupled with the non-uniform frequency range transform function flattening filterbank; a sound attack detection unit coupled with the adaptive sound attack based transform length varying filterbank; and a multiplexer coupled with the adaptive sound attack based transform length varying filterbank.
 37. The apparatus of claim 36 wherein the non-uniform frequency range transform function flattening filterbank comprises: a modified discrete cosine transform unit; a frequency range boundary zero stuffing unit coupled with the transform unit; and a plurality of parallel inverse modified discrete cosine transform units coupled with the frequency range boundary zero stuffing unit.
 38. The apparatus of claim 36 wherein the adaptive sound attack based transform length varying filterbank comprises a plurality of parallel multi-length transform units.
 39. The apparatus of claim 35 further comprising a psycho-acoustic model computing unit coupled with the adaptive non-uniform filterbank and the quantization unit.
 40. An apparatus comprising: a non-uniform frequency range transform function flattening filterbank to receive an audio signal, to partition the audio signal into varying frequency ranges of frequency coefficients, and to perform zero bit stuffing at the boundaries of the frequency ranges and to drop certain high frequency coefficients; a sound attack detection unit coupled with the non-uniform frequency range transform function flattening filterbank, the sound attack detection unit to locate sound attacks within the audio signal; an adaptive sound attack based transform length varying filterbank coupled with the non-uniform frequency range transform function flattening filterbank and the sound attack detection unit, the adaptive sound attack based transform length varying filterbank to perform varying length transforms on the audio signal based on sound attack detection indicated by the sound attack detection unit; a multiplexer coupled with the adaptive sound attack based transform length varying filterbank; a quantization unit coupled with the multiplexer; a psycho-acoustic model (PAM) computing unit coupled with the multiplexer; and a lossless coding unit coupled with the quantization unit and the PAM computing unit, the lossless coding unit to losslessly code transform coefficients received from the quantization unit.
 41. The apparatus of claim 40 wherein the non-uniform frequency range transform function flattening filterbank comprises: a modified discrete cosine transform unit; a frequency range boundary zero stuffing unit coupled with the transform unit; and a plurality of parallel inverse modified discrete cosine transform units coupled with the frequency range boundary zero stuffing unit.
 42. The apparatus of claim 40 wherein the adaptive sound attack based transform length varying filterbank comprises a plurality of parallel multi-length transform units.
 43. An audio decoder comprising: a demultiplexer to receive a bitstream and to extract a sequence of transform frequency coefficients; and an inverse adaptive non-uniform filterbank coupled with the demultiplexer, the inverse adaptive non-uniform filterbank to partition a sequence of transform frequency coefficients into a plurality of non-uniform width frequency ranges, to remove certain boundary transform frequency coefficients originally based on zeros, and to insert zeros for previously removed high range transform frequency coefficients.
 44. The audio decoder of claim 43 wherein the inverse adaptive non-uniform filterbank includes: a plurality of parallel inverse modified discrete cosine transform units; a plurality of parallel modified discrete cosine transform units coupled with the plurality of parallel inverse modified discrete cosine transform units; a plurality of parallel de-normalization units coupled with the plurality of parallel modified discrete cosine transform units; a zero removing unit coupled with the plurality of de-normalization units; and an inverse modified discrete cosine transform unit coupled with the zero removing unit.
 45. The audio decoder of claim 43 further comprising a decoder and dequantizer unit coupled with the demultiplexer.
 46. A machine-readable medium having a set of instructions stored thereon, which when executed by a set of one or more processors causes the set of processors to perform the operations comprising: receiving an audio signal; applying transform coding to the audio signal to generate a sequence of transform frequency coefficients; partitioning the sequence of transform frequency coefficients into a plurality of non-uniform width frequency ranges; inserting zero value frequency coefficients at the boundaries of the non-uniform width frequency ranges; and dropping certain of the transform frequency coefficients that represent high frequencies.
 47. The machine-readable medium of claim 46 further comprising separately applying a transform to each of the plurality of non-uniform width frequency ranges.
 48. The machine-readable medium of claim 47 wherein application of the transform is in parallel.
 49. The machine-readable medium of claim 46 further comprising varying length of transform operations applied to each of the plurality of non-uniform width frequency ranges.
 50. The machine-readable medium of claim 46 wherein the number of dropped transform frequency coefficients is equal to the number of inserted zero value frequency coefficients.
 51. The machine-readable medium of claim 46 further comprising: constructing a psycho-acoustic model with the plurality of non-uniform width frequency ranges with inserted zero value frequency coefficients; and quantizing the plurality of non-uniform width frequency ranges with inserted zero value frequency coefficients.
 52. A machine-readable medium having a set of instructions stored thereon, which when executed by a set of one or more processors causes the set of processors to perform the operations comprising: generating a plurality of frequency coefficients representing an audio signal; grouping the plurality of frequency coefficients into frequency ranges of non-uniform width; determining if a sound attack occurs in any one of the non-uniform width frequency ranges; and performing short transforms on those non-uniform frequency ranges that have a sound attack and long transforms on those non-uniform frequency ranges that do not have a sound attack.
 53. The machine-readable medium of claim 52 further comprising stuffing zeros at the boundaries of the non-uniform width frequency ranges and dropping certain of the plurality of frequency coefficients that represent higher end frequencies.
 54. The machine-readable medium of claim 53 wherein stuffing zeros at the boundaries comprises: inserting zeros at the boundaries of the frequency ranges; and shifting those of the plurality of frequency coefficients that are displaced by the inserted zeros into the next frequency range.
 55. The machine-readable medium of claim 52 further comprising separately performing transforms on each of the plurality of non-uniform width frequency ranges based on their width.
 56. The machine-readable medium of claim 55 wherein the transforms are inverse modified discrete cosine transforms.
 57. The machine-readable medium of claim 52 wherein the performed long and short transforms are modified discrete cosine transforms.
 58. A machine-readable medium having a set of instructions stored thereon, which when executed by a set of one or more processors causes the set of processors to perform the operations comprising: applying a transform to a plurality of audio samples to generate a sequence of transform frequency coefficients; and partitioning the sequence of transform frequency coefficients into varying width frequency subbands with zero value frequency coefficients at the boundaries of the frequency subbands.
 59. The machine-readable medium of claim 58 further comprising dropping a set of one or more transform frequency coefficients in the highest frequency subband.
 60. The machine-readable medium of claim 59 wherein the number of dropped transform frequency coefficients corresponds to the number of zero value frequency coefficients stuffed at the boundaries of the frequency subbands.
 61. The machine-readable medium of claim 58 further comprising: constructing a psycho-acoustic model with the varying width subbands; and quantizing the varying width subbands.
 62. The machine-readable medium of claim 58 further comprising applying transforms of varying length to each of the varying width subbands.
 63. A machine-readable medium having a set of instructions stored thereon, which when executed by a set of one or more processors causes the set of processors to perform the operations comprising: partitioning an audio input into a plurality of non-uniform frequency subbands, each of the plurality of non-uniform frequency subbands including a set of one or more frequency coefficients; displacing those of the set of frequency coefficients at the boundary of each subband with zeros; and dropping those of the set of frequency coefficients that fall outside of the plurality of frequency subbands after the displacing.
 64. The machine-readable medium of claim 63 further comprising separately applying a transform to each of the plurality of non-uniform frequency subbands.
 65. The machine-readable medium of claim 64 wherein application of the transform is in parallel.
 66. The machine-readable medium of claim 63 further comprising varying length of transform operations applied to each of the plurality of non-uniform frequency subbands.
 67. The machine-readable medium of claim 63 wherein the number of dropped frequency coefficients is equal to the number of inserted zeros.
 68. The machine-readable medium of claim 63 further comprising: constructing a psycho-acoustic model with the plurality of non-uniform frequency subbands; and quantizing the plurality of non-uniform frequency subbands.
 69. A machine-readable medium having a set of instructions stored thereon, which when executed by a set of one or more processors causes the set of processors to perform the operations comprising: generating a plurality of non-uniform frequency subbands, each of the plurality of non-uniform frequency subbands including a set of one or more frequency coefficients, from an audio input signal; displacing those of the set of frequency coefficients at the boundary of each non-uniform frequency subband with zeros; separately normalizing the non-uniform frequency subbands, including the zeros; varying transform length applied to each of the plurality of non-uniform frequency subbands based on the detection of a sound attack within the plurality of non-uniform frequency subbands; and multiplexing the plurality of non-uniform frequency subbands.
 70. The machine-readable medium of claim 69 wherein inverse modified discrete transform is applied to the plurality of non-uniform frequency subbands after normalizing.
 71. The machine-readable medium of claim 69 wherein the varied transform is modified discrete cosine transform.
 72. A machine-readable medium having a set of instructions stored thereon, which when executed by a set of one or more processors causes the set of processors to perform the operations comprising: receiving a bitstream; extracting a sequence of transform frequency coefficients from the bitstream; demultiplexing the sequence of transform frequency coefficients into a plurality of frequency ranges; removing boundary transform frequency coefficients that were originally zeros from the plurality of frequency ranges; shifting the remaining transform frequency coefficients to fill in for the removed boundary transform frequency coefficients; and inserting zeros into vacancies in the higher range of the plurality of frequency ranges caused by said shifting.
 73. The machine-readable medium of claim 72 further comprising applying inverse modified discrete cosine transform to the plurality of frequency ranges.
 74. The machine-readable medium of claim 72 further comprising decoding and dequantizing the sequence of transform frequency coefficients.
 75. A machine-readable medium having a set of instructions stored thereon, which when executed by a set of one or more processors causes the set of processors to perform the operations comprising: partitioning an audio signal into a plurality of non-uniform width frequency ranges, each of the plurality of non-uniform width frequency ranges including a set of one or more transform frequency coefficients; indicating the width of each of the plurality of frequency ranges; separately processing each of the plurality of non-uniform width frequency ranges; and encoding the plurality of frequency ranges and their width indications.
 76. The machine-readable medium of claim 75 further comprising separately performing transform length switching on one of the plurality of frequency ranges based on detection of a sound attack within the one of the plurality of frequency ranges.
 77. The machine-readable medium of claim 75 further comprising: stuffing zeros at the boundaries of the plurality of frequency ranges; shifting those transform frequency coefficients displaced by the stuffed zeros; and dropping those transform frequency coefficients that fall outside of the plurality of frequency ranges from said shifting.
 78. The machine-readable medium of claim 75 wherein the processing comprises normalizing and transforming.
 79. The machine-readable medium of claim 78 wherein the transforming is modified discrete cosine transforming.
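
By way of illustration only, the boundary zero stuffing recited in the compression claims (insert zeros at the range boundaries, shift the displaced coefficients toward higher frequencies, drop what no longer fits) and its reversal recited in the decompression claims may be sketched as follows. The sketch is not the claimed implementation. The function names stuff_zeros and unstuff_zeros, the pad parameter, and the choice to place zeros at both edges of every frequency range are assumptions made for the example.

```python
from typing import List, Sequence, Tuple


def stuff_zeros(coeffs: Sequence[float], widths: Sequence[int],
                pad: int = 1) -> Tuple[List[List[float]], int]:
    """Split coeffs into ranges of the given (non-uniform) widths.

    Assumption: `pad` zeros are placed at BOTH edges of every range.
    Coefficients displaced by the zeros shift into the next range, and
    the highest-frequency coefficients that no longer fit are dropped.
    Returns the padded ranges and the number of dropped coefficients.
    """
    if sum(widths) != len(coeffs):
        raise ValueError("range widths must cover the coefficient sequence")
    if any(w <= 2 * pad for w in widths):
        raise ValueError("every range must be wider than its zero padding")

    ranges: List[List[float]] = []
    pos = 0  # index of the next original coefficient to place
    for w in widths:
        keep = w - 2 * pad  # room left for real coefficients in this range
        ranges.append([0.0] * pad + list(coeffs[pos:pos + keep]) + [0.0] * pad)
        pos += keep
    dropped = len(coeffs) - pos  # high-frequency coefficients that fell off
    return ranges, dropped


def unstuff_zeros(ranges: Sequence[Sequence[float]], pad: int = 1) -> List[float]:
    """Decoder side: remove the boundary zeros, shift the remaining
    coefficients down, and fill the vacated high end with zeros."""
    total = sum(len(r) for r in ranges)
    kept: List[float] = []
    for r in ranges:
        kept.extend(r[pad:len(r) - pad])
    return kept + [0.0] * (total - len(kept))


if __name__ == "__main__":
    spectrum = [float(i) for i in range(1, 13)]  # 12 toy coefficients
    padded, dropped = stuff_zeros(spectrum, widths=[3, 4, 5])
    print(padded, dropped)
    print(unstuff_zeros(padded))
```

Under these assumptions the number of dropped coefficients equals the number of inserted zeros (two per range times pad), so the total coefficient count is unchanged by the stuffing, and the decoder-side function returns a sequence of the original length whose upper end is zero-filled.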